Abstractive spoken document summarization using hierarchical model with multi-stage attention diversity optimization
Abstractive summarization is a standard task for written documents, such as news articles. Applying summarization schemes to spoken documents is more challenging, especially in situations involving human interactions, such as meetings. Here, utterances tend not to form complete sentences and sometimes contain little information. Moreover, speech disfluencies will be present, as well as recognition errors for automated systems. For current attention-based sequence-to-sequence summarization systems, these additional challenges can yield a poor attention distribution over the spoken document words and utterances, impacting performance. In this work, we propose a multi-stage method based on a hierarchical encoder-decoder model to explicitly model utterance-level attention distribution at training time, and enforce diversity at inference time using a unigram diversity term. Furthermore, multitask learning tasks including dialogue act classification and extractive summarization are incorporated. The performance of the system is evaluated on the AMI meeting corpus. The inclusion of both training and inference diversity terms improves performance, outperforming current state-of-the-art systems in terms of ROUGE scores. Additionally, the impact of ASR errors, as well as performance on the multitask learning tasks, is evaluated.
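The inference-time unigram diversity term can be illustrated with a minimal sketch, purely for intuition: a hypothesis score is penalized in proportion to how many unigrams it repeats. The penalty weight and exact form here are hypothetical, not the paper's formulation.

```python
from collections import Counter

def diversity_adjusted_score(log_prob, hypothesis, penalty=0.5):
    """Penalize a hypothesis score by the number of repeated unigrams.

    `penalty` is an illustrative weight; the paper's exact diversity
    term may differ. A hypothesis that repeats itself scores lower,
    discouraging degenerate repetitive summaries during beam search.
    """
    counts = Counter(hypothesis)
    repeats = sum(c - 1 for c in counts.values() if c > 1)
    return log_prob - penalty * repeats
```

For example, a hypothesis repeating "the cat" twice incurs a penalty of 2 repeated unigrams, so its log-probability of -2.0 drops to -3.0 under this sketch.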
Impact of ASR performance on spoken grammatical error detection
Computer assisted language learning (CALL) systems aid learners to monitor their progress by providing scoring and feedback on language assessment tasks. Free speaking tests allow assessment of what a learner has said, as well as how they said it. For these tasks, Automatic Speech Recognition (ASR) is required to generate transcriptions of a candidate's responses; the quality of these transcriptions is crucial for providing reliable feedback in downstream processes. This paper considers the impact of ASR performance on Grammatical Error Detection (GED) for free speaking tasks, as an example of providing feedback on a learner's use of English. The performance of an advanced deep-learning based GED system, initially trained on written corpora, is used to evaluate the influence of ASR errors. One consequence of these errors is that grammatical errors can result from incorrect transcriptions as well as learner errors, which may yield confusing feedback. To mitigate the effect of these errors, and reduce erroneous feedback, ASR confidence scores are incorporated into the GED system. By additionally adapting the written text GED system to the speech domain, using ASR transcriptions, significant gains in performance can be achieved. Analysis of the GED performance for different grammatical error types and across grades is also presented.
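One simple way to use ASR confidence, sketched here purely for illustration (the paper's integration into the GED model itself may be more involved), is to suppress error flags on tokens the recognizer was unsure about, since an apparent grammatical error there may really be a misrecognition.

```python
def filter_ged_flags(tokens, ged_flags, asr_confidences, threshold=0.6):
    """Suppress grammatical-error flags on low-confidence ASR tokens.

    A flagged 'error' on a token the recognizer was unsure about may be
    a transcription mistake rather than a learner error, so it is not
    reported. The threshold value is illustrative, not from the paper.
    """
    return [flag and conf >= threshold
            for flag, conf in zip(ged_flags, asr_confidences)]
```

With a threshold of 0.6, a flag on a token recognized with confidence 0.4 is dropped, while one at 0.8 is kept.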
Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems
Transformer models have achieved state-of-the-art results in a wide range of NLP tasks including summarization. Training and inference using large transformer models can be computationally expensive. Previous work has focused on one important bottleneck, the quadratic self-attention mechanism in the encoder. Modified encoder architectures such as LED or LoBART use local attention patterns to address this problem for summarization. In contrast, this work focuses on the transformer's encoder-decoder attention mechanism. The cost of this attention becomes more significant in inference or training approaches that require model-generated histories. First, we examine the complexity of the encoder-decoder attention. We demonstrate empirically that there is a sparse sentence structure in document summarization that can be exploited by constraining the attention mechanism to a subset of input sentences, whilst maintaining system performance. Second, we propose a modified architecture that selects the subset of sentences to constrain the encoder-decoder attention. Experiments are carried out on abstractive summarization tasks, including CNN/DailyMail, XSum, Spotify Podcast, and arXiv.
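Constraining encoder-decoder attention to a subset of input sentences can be sketched as a cross-attention mask: encoder positions belonging to unselected sentences are excluded before the softmax, so attention weight is distributed only over the kept sentences. This is an illustrative toy, not the proposed architecture.

```python
import math

def sentence_subset_mask(sentence_ids, selected):
    """True where an encoder token belongs to a selected sentence."""
    return [sid in selected for sid in sentence_ids]

def masked_softmax(scores, mask):
    """Softmax over unmasked positions only; masked positions get 0.

    Subtracting the max of the kept scores keeps the exponentials
    numerically stable.
    """
    kept = [s for s, m in zip(scores, mask) if m]
    mx = max(kept)
    exps = [math.exp(s - mx) if m else 0.0 for s, m in zip(scores, mask)]
    z = sum(exps)
    return [e / z for e in exps]
```

With five encoder tokens drawn from sentences [0, 0, 1, 1, 2] and only sentences {0, 2} selected, the two tokens of sentence 1 receive zero attention and the remaining weight sums to one over the other three.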
Long-span summarization via local attention and content selection
Transformer-based models have achieved state-of-the-art results in a wide range of natural language processing (NLP) tasks including document summarization. Typically these systems are trained by fine-tuning a large pre-trained model to the target task. One issue with these transformer-based models is that they do not scale well in terms of memory and compute requirements as the input length grows. Thus, for long document summarization, it can be challenging to train or fine-tune these models. In this work, we exploit large pre-trained transformer-based models and address long-span dependencies in abstractive summarization using two methods: local self-attention; and explicit content selection. These approaches are compared on a range of network configurations. Experiments are carried out on standard long-span summarization tasks, including Spotify Podcast, arXiv, and PubMed datasets. We demonstrate that by combining these methods, we can achieve state-of-the-art results on all three tasks in the ROUGE scores. Moreover, without a large-scale GPU card, our approach can achieve comparable or better results than existing approaches.
1. ALTA Institute, Cambridge Assessment English, University of Cambridge
2. Cambridge International & St John's College Scholarship
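Local self-attention can be illustrated by the window of positions each token is allowed to attend to. The sketch below (illustrative only, not a specific model's implementation) shows why memory grows as O(n·w) with window size w instead of O(n²) for full self-attention.

```python
def local_attention_window(seq_len, window):
    """For each position i, the half-open span of positions it may
    attend to under local self-attention: [i - window, i + window],
    clipped to the sequence boundaries. Each position touches at most
    2 * window + 1 others, so cost is linear in sequence length."""
    return [(max(0, i - window), min(seq_len, i + window + 1))
            for i in range(seq_len)]
```

For a sequence of length 5 with window 1, position 2 attends to positions 1-3, while boundary positions get clipped spans.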
Disfluency Detection for Spoken Learner English
One of the challenges for computer aided language learning (CALL) is providing high quality feedback to learners. An obstacle to improving feedback is the lack of labelled training data for tasks such as spoken "grammatical" error detection and correction, both of which provide important features that can be used in downstream feedback systems. One approach to addressing this lack of data is to convert the output of an automatic speech recognition (ASR) system into a form that is closer to text data, for which there is significantly more labelled data available. Disfluency detection, locating regions of the speech where, for example, false starts and repetitions occur, and subsequent removal of the associated words, helps to make speech transcriptions more text-like. Additionally, ASR systems do not usually generate sentence-like units; the output is simply a sequence of words associated with the particular speech segmentation used for coding. This motivates the need for automated systems for sentence segmentation. By combining these approaches, advanced text processing techniques should perform significantly better on the output from spoken language processing systems. Unfortunately, there is not enough labelled data available to train these systems on spoken learner English. In this work, disfluency detection and "sentence" segmentation systems trained on data from native speakers are applied to spoken grammatical error detection and correction tasks for learners of English. Performance gains using these approaches are shown on a free speaking test.
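Once tokens have been tagged, the removal step itself is simple. A toy sketch, which assumes the per-token flags come from an upstream disfluency detector:

```python
def remove_disfluencies(tokens, disfluency_flags):
    """Drop tokens tagged as disfluent (false starts, repetitions,
    filled pauses) so the transcription reads more like written text,
    on which downstream GED systems were trained."""
    return [t for t, flag in zip(tokens, disfluency_flags) if not flag]
```

For instance, "i i want uh to go" with the repeated "i" and the filler "uh" flagged becomes "i want to go".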
Impact of ASR performance on spoken grammatical error detection
Computer assisted language learning (CALL) systems aid learners to monitor their progress by providing scoring and feedback on language assessment tasks. Free speaking tests allow assessment of what a learner has said, as well as how they said it. For these tasks, Automatic Speech Recognition (ASR) is required to generate transcriptions of a candidate's responses, the quality of these transcriptions is crucial to provide reliable feedback in downstream processes. This paper considers the impact of ASR performance on Grammatical Error Detection (GED) for free speaking tasks, as an example of providing feedback on a learner's use of English. The performance of an advanced deep-learning based GED system, initially trained on written corpora, is used to evaluate the influence of ASR errors. One consequence of these errors is that grammatical errors can result from incorrect transcriptions as well as learner errors, this may yield confusing feedback. To mitigate the effect of these errors, and reduce erroneous feedback, ASR confidence scores are incorporated into the GED system. By additionally adapting the written text GED system to the speech domain, using ASR transcriptions, significant gains in performance can be achieved. Analysis of the GED performance for different grammatical error types and across grade is also presented